 Civil Rights & Constitutional Law


MMLONGBENCH: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly

Neural Information Processing Systems

The rapid extension of context windows in large vision-language models has given rise to long-context vision-language models (LCVLMs), which are capable of handling hundreds of images with interleaved text tokens in a single forward pass. In this work, we introduce MMLONGBENCH, the first benchmark covering a diverse set of long-context vision-language tasks, to evaluate LCVLMs effectively and thoroughly. MMLONGBENCH is composed of 13,331 examples spanning five different categories of downstream tasks, such as Visual RAG and Many-Shot ICL. It also provides broad coverage of image types, including various natural and synthetic images. To assess the robustness of the models to different input lengths, all examples are delivered at five standardized input lengths (8K-128K tokens) via a cross-modal tokenization scheme that combines vision patches and text tokens. Through a thorough benchmarking of 46 closed-source and open-source LCVLMs, we provide a comprehensive analysis of the current models' vision-language long-context ability. Our results show that: i) performance on a single task is a weak proxy for overall long-context capability; ii) both closed-source and open-source models face challenges in long-context vision-language tasks, indicating substantial room for future improvement; iii) models with stronger reasoning ability tend to exhibit better long-context performance. By offering wide task coverage, various image types, and rigorous length control, MMLONGBENCH provides the missing foundation for diagnosing and advancing the next generation of LCVLMs.


You Can't Separate Juneteenth From the Call for Reparations

TIME - Tech



Israel kills at least three Palestinians in Gaza City drone strike

Al Jazeera

At least three Palestinians have been killed and several others wounded after an Israeli drone struck a vehicle near Abu Khadra Mosque in the Rimal neighbourhood of western Gaza City, according to medical sources. Al Jazeera's Hind Khoudary, reporting from Gaza City, said the attack on Thursday was the first explosion in the area after a few "calm and quiet" days. "Only one of the three victims has been identified: Abdul Jawad Abu Lebn [who] was set to get married next week. Wedding invitations were found inside the car."


3 Amazon Workers Say They're Under Investigation for Speaking Out About Data Centers

WIRED

The software engineers filed a complaint with Seattle's civil rights office accusing Amazon of illegally retaliating against them for expressing their personal political beliefs. Earlier this month, five current Amazon employees publicly urged Seattle City Council to regulate data centers. It was an unprecedented act of advocacy by tech workers, and now three of the staffers say they are under internal investigation for what they understand to be allegedly representing themselves as spokespeople for the company without prior approval. "It's a totally ridiculous claim," says one of the affected employees, Patrick Schloesser. The three software engineers, who work in different divisions of Amazon and all live in Seattle, believe they are being unfairly targeted for expressing their political beliefs.


The UK Will Scan Asylum-Seekers' Faces for Age Checks--Despite Knowing the Tech Is Flawed

WIRED

Age verification is consuming the internet. From social media bans in Australia to porn restrictions in half of US states, for many, having to prove their age to access websites is becoming an everyday requirement. But one of the key technologies underpinning many of these age checks is about to seep into the offline world--with potentially life-changing consequences for people having their age predicted by AI. Starting next year, the British government is planning to introduce facial age estimation--where AI scans your face and suggests how old you are--to help determine the age of asylum seekers arriving at the United Kingdom's border. The move is believed to be the first time that a so-called facial age estimation (FAE) system has been used in this way.


Young Palestinian women learn AI to tell stories of war on Gaza

Al Jazeera

Young Palestinian women in Gaza are learning to use artificial intelligence to create short films and tell stories about their life during the war.


Frank Wills and the Importance of Ordinary Americans Doing the Right Thing

TIME - Tech



Safety Pretraining: Toward the Next Generation of Safe AI

Neural Information Processing Systems

As large language models (LLMs) are increasingly deployed in high-stakes settings, the risk of generating harmful or toxic content remains a central challenge. Post-hoc alignment methods are brittle: once unsafe patterns are learned during pretraining, they are hard to remove. In this work, we present a data-centric pretraining framework that builds safety into the model from the start. Our framework consists of four key steps: (i) Safety Filtering: we build a safety classifier to classify web data into safe and unsafe categories; (ii) Safety Rephrasing: we recontextualize unsafe web data into safer narratives; (iii) Native Refusal: we synthetically generate pretraining datasets that actively teach models to refuse on unsafe content and the moral reasoning behind it; and (iv) Harmfulness-Tag annotated pretraining: we flag unsafe content during pretraining using a special token, and use it to steer models away from unsafe generations at inference time. Our safety-pretrained models reduce attack success rates from 38.8% to 8.4% on standard LLM safety benchmarks with no performance degradation on general tasks.


Why Fines Alone Won't Make Social Media Safer For Kids

TIME - Tech

If courts want to reduce harm, they must focus on product design choices, measurable safety outcomes, and governance, write Peter Chapman, Ravi Iyer, and Meetali Jain.


Actor Wilson Cruz Is Still Standing Up for Queer Students

TIME - Tech

Carlin is a contributor for TIME. In 1994, at 20 years old, Wilson Cruz became the first out gay actor to play an out lead character on U.S. primetime television: 15-year-old Enrique "Rickie" Vasquez on the ABC teen "That was the power of Rickie Vasquez--he was going to allow people to see themselves for the first time," he says.